在分支机构和结合中得出良好的可变选择策略对于现代混合编程(MIP)求解器的效率至关重要。通过在先前的解决方案过程中收集的MIP分支数据,学习分支方法最近变得比启发式方法更好。由于分支机构自然是一项顺序决策任务,因此应该学会优化整个MIP求解过程的实用性,而不是在每个步骤上都是近视。在这项工作中,我们将学习作为离线增强学习(RL)问题进行分支,并提出了一种长期视线的混合搜索方案来构建离线MIP数据集,该数据集对分支决策的长期实用程序。在政策培训阶段,我们部署了基于排名的奖励分配计划,以将有希望的样本与长期或短期视图区分开,并通过离线政策学习训练名为分支排名的分支模型。合成MIP基准和现实世界任务的实验表明,与广泛使用的启发式方法和基于先进的学习分支模型相比,分支rankink更有效,更健壮,并且可以更好地概括为MIP实例的大型MIP实例。
translated by 谷歌翻译
透明对象对视觉感知系统提出了多个不同的挑战。首先,他们缺乏区分视觉特征使透明对象比不透明的对象更难检测和本地化。即使人类也发现某些透明的表面几乎没有镜面反射或折射,例如玻璃门,难以感知。第二个挑战是,通常用于不透明对象感知的常见深度传感器由于其独特的反射特性而无法对透明对象进行准确的深度测量。由于这些挑战,我们观察到,同一类别(例如杯子)内的透明对象实例看起来与彼此相似,而不是同一类别的普通不透明对象。鉴于此观察结果,本文着手探讨类别级透明对象姿势估计的可能性,而不是实例级姿势估计。我们提出了TransNet,这是一种两阶段的管道,该管道学会使用局部深度完成和表面正常估计来估计类别级别的透明对象姿势。在最近的大规模透明对象数据集中,根据姿势估计精度评估了TransNet,并将其与最先进的类别级别姿势估计方法进行了比较。该比较的结果表明,TransNet可以提高透明对象的姿势估计准确性,并从随附的消融研究中提高了关键发现,这表明未来的方向改善了绩效。
translated by 谷歌翻译
用于对象检测的常规知识蒸馏(KD)方法主要集中于同质的教师学生探测器。但是,用于部署的轻质检测器的设计通常与高容量探测器显着不同。因此,我们研究了异构教师对之间的KD,以进行广泛的应用。我们观察到,异质KD(异核KD)的核心难度是由于不同优化的方式而导致异质探测器的主链特征之间的显着语义差距。常规的同质KD(HOMO-KD)方法遭受了这种差距的影响,并且很难直接获得异性KD的令人满意的性能。在本文中,我们提出了异助剂蒸馏(Head)框架,利用异质检测头作为助手来指导学生探测器的优化以减少此间隙。在头上,助手是一个额外的探测头,其建筑与学生骨干的老师负责人同质。因此,将异源KD转变为同性恋,从而可以从老师到学生的有效知识转移。此外,当训练有素的教师探测器不可用时,我们将头部扩展到一个无教师的头(TF-Head)框架。与当前检测KD方法相比,我们的方法已取得了显着改善。例如,在MS-COCO数据集上,TF-Head帮助R18视网膜实现33.9 MAP(+2.2),而Head将极限进一步推到36.2 MAP(+4.5)。
translated by 谷歌翻译
几乎所有的多代理强化学习算法没有交流,都遵循分散执行的集中培训原则。在集中培训期间,代理可以以相同的信号为指导,例如全球国家。但是,在分散执行期间,代理缺乏共享信号。受到观点不变性和对比学习的启发,我们在本文中提出了共识学习,以学习合作的多代理增强学习。尽管基于局部观察结果,但不同的代理可以在离散空间中推断出相同的共识。在分散执行期间,我们将推断的共识作为对代理网络的明确输入提供了,从而发展了他们的合作精神。我们提出的方法可以扩展到具有小模型更改的各种多代理增强学习算法。此外,我们执行一些完全合作的任务,并获得令人信服的结果。
translated by 谷歌翻译
透明的物体在家庭环境中无处不在,并且对视觉传感和感知系统构成了不同的挑战。透明物体的光学特性使常规的3D传感器仅对物体深度和姿势估计不可靠。这些挑战是由重点关注现实世界中透明对象的大规模RGB深度数据集突出了这些挑战。在这项工作中,我们为名为ClearPose的大规模现实世界RGB深度透明对象数据集提供了一个用于分割,场景级深度完成和以对象为中心的姿势估计任务的基准数据集。 ClearPose数据集包含超过350K标记的现实世界RGB深度框架和5M实例注释,涵盖了63个家用对象。该数据集包括在各种照明和遮挡条件下在日常生活中常用的对象类别,以及具有挑战性的测试场景,例如不透明或半透明物体的遮挡病例,非平面取向,液体的存在等。 - 艺术深度完成和对象构成清晰度上的深神经网络。数据集和基准源代码可在https://github.com/opipari/clearpose上获得。
translated by 谷歌翻译
视觉感知任务通常需要大量的标记数据,包括3D姿势和图像空间分割掩码。创建此类培训数据集的过程可能很难或耗时,可以扩展到一般使用的功效。考虑对刚性对象的姿势估计的任务。在大型公共数据集中接受培训时,基于神经网络的深层方法表现出良好的性能。但是,将这些网络调整为其他新颖对象,或针对不同环境的现有模型进行微调,需要大量的时间投资才能产生新标记的实例。为此,我们提出了ProgressLabeller作为一种方法,以更有效地以可扩展的方式从彩色图像序列中生成大量的6D姿势训练数据。 ProgressLabeller还旨在支持透明或半透明的对象,以深度密集重建的先前方法将失败。我们通过快速创建一个超过1M样品的数据集来证明ProgressLabeller的有效性,我们将其微调一个最先进的姿势估计网络,以显着提高下游机器人的抓地力。 ProgressLabeller是https://github.com/huijiezh/progresslabeller的开放源代码。
translated by 谷歌翻译
This paper focuses on designing efficient models with low parameters and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading-off model accuracy and constrained resources still need further improvements. This work rethinks the essential unity of efficient Inverted Residual Block in MobileNetv2 and effective Transformer in ViT, inductively abstracting a general concept of Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance though sharing the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 that surpass \textbf{SoTA} CNN-/Transformer-based models, while trading-off the model accuracy and efficiency well.
translated by 谷歌翻译
Supervised Question Answering systems (QA systems) rely on domain-specific human-labeled data for training. Unsupervised QA systems generate their own question-answer training pairs, typically using secondary knowledge sources to achieve this outcome. Our approach (called PIE-QG) uses Open Information Extraction (OpenIE) to generate synthetic training questions from paraphrased passages and uses the question-answer pairs as training data for a language model for a state-of-the-art QA system based on BERT. Triples in the form of <subject, predicate, object> are extracted from each passage, and questions are formed with subjects (or objects) and predicates while objects (or subjects) are considered as answers. Experimenting on five extractive QA datasets demonstrates that our technique achieves on-par performance with existing state-of-the-art QA systems with the benefit of being trained on an order of magnitude fewer documents and without any recourse to external reference data sources.
translated by 谷歌翻译
Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.
translated by 谷歌翻译
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult to inductively infer by KGEs. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework AnKGE to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects from entity-level, relation-level, and triple-level. And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding. In order to combine inductive inference capability from the original KGE model and analogical inference capability enhanced by AnKGE, we interpolate the analogy score with the base model score and introduce the adaptive weights in the score function for prediction. Through extensive experiments on FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on link prediction task and well performs analogical inference.
translated by 谷歌翻译